Grammatical Induction and Recognition of the Documentary Form of Records
Abstract
This paper presents digital curators with a more precise understanding of the concept of documentary form, and shows how documentary form can be automatically learned from a sample of records of a particular document type. The ability to automatically recognize documentary form enables item description. Item description enables file unit description, which in turn enables automatic series description. This technology can reduce the effort required of an appraisal archivist to assess the value of record series containing a large number of e-records of different documentary forms. It can also give archivists earlier intellectual control of accessioned e-record series by providing preliminary scope and content notes for these series. Item descriptions also provide additional ways of indexing and searching collections of records.
Introduction
Among the challenges archivists face in appraising e-records and gaining intellectual control of accessioned e-records is the enormous volume of records and the time required to read and understand their content. According to one source, "the Clinton White House generated 38 million e-mail messages (and the current Bush White House is expected to generate triple that number)." [3] Archivists must review presidential records page by page before the records can be disclosed to the public or it is determined that there are restrictions on disclosure. Data collected on declassification review indicate that a reviewer can review on average one page per minute, or 60 pages per hour. Given 1920 work hours per year, an archivist doing nothing other than review could be expected to review about 115,000 pages per year. NARA provides eight archivists to each Presidential Library, one of whom is a Supervisory Archivist. Assuming seven archivists reviewing records, and an email with attachments averaging one page in length, they could review about 800,000 email messages per year. At that rate, it will take 125 years for Presidential Library archivists to review and describe the Bush Administration's email for the first time.
In the next section, a method is described for recognizing the documentary form of records created by office applications such as word processors, spreadsheets and database management systems. Then it is shown how the ability to automatically recognize document type enables the automatic description of items, file units and record series. Finally, the paper discusses how these technologies can aid archivists in appraising e-records and gaining intellectual control of accessioned e-records.
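The review-time estimate in the introduction can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the text; the function names are illustrative, and the 100 million message total in the last line is an assumption, not a figure from the source. It is included because the 125-year estimate corresponds to roughly that total at the rounded rate of 800,000 messages per year, whereas a literal tripling of 38 million gives a somewhat longer estimate.

```python
# Back-of-the-envelope check of the review-time estimate in the introduction.
PAGES_PER_HOUR = 60            # one page per minute (from the text)
WORK_HOURS_PER_YEAR = 1920     # from the text
REVIEWING_ARCHIVISTS = 7       # eight per library, minus the Supervisory Archivist
PAGES_PER_EMAIL = 1            # average email with attachments (from the text)
CLINTON_EMAILS = 38_000_000    # from the quoted source


def emails_per_year(archivists=REVIEWING_ARCHIVISTS):
    """Emails a team can review per year if it does nothing but review."""
    pages_per_archivist = PAGES_PER_HOUR * WORK_HOURS_PER_YEAR  # 115,200 pages
    return archivists * pages_per_archivist // PAGES_PER_EMAIL


def years_to_review(total_emails, archivists=REVIEWING_ARCHIVISTS):
    """Years needed for one full pass over total_emails."""
    return total_emails / emails_per_year(archivists)


if __name__ == "__main__":
    print(emails_per_year())                            # 806400
    print(round(years_to_review(3 * CLINTON_EMAILS)))   # 141 (literal tripling)
    print(round(years_to_review(100_000_000)))          # 124 (assumed total)
```

Either way the estimate is well over a century for a single pass, which is the point the introduction makes.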
Similar resources
Grammar-Based Recognition of Documentary Forms and Extraction of Metadata
Metadata extraction is a critical aspect of ingestion of collections into digital archives and libraries. A method for automatically recognizing document types and extracting metadata from digital records has been developed. The method is based on a method for automatically annotating semantic categories such as person’s names, job titles, dates, and postal addresses that may occur in a record....
Planned Focus-on-form Instruction in Task-based Language Teaching: The case of EFL learners’ oral grammatical accuracy performance
This study investigated the effects of planned focus-on-form instruction (pFFI) on developing oral grammatical accuracy in Iranian EFL learners. To this end, 60 lower-intermediate EFL learners studying English in a private English language institute in Tehran, Iran, were randomly assigned to two classes. Both classes received task-based instruction on grammatical points elicited in or...
Types of Grammatical Metaphors in Harry Potter and the Prisoner of Azkaban
Grammatical Metaphor (GM) is one of the fresh language phenomena introduced by Halliday (1985) in the framework of functional grammar. Thompson (2004) states that the salient source of GM would be ‘Nominalization’ where a noun form attempts to represent a verb form or in other words, a verb form with its different process is represented in a noun form. He continues that any wording is ought to ...
Level of Grammatical Proficiency and Acquisition of Functional Projections: The case of Iranian learners of English language
Unlike Lexical Projections, Functional Projections (Extended Projections) are more abstract in nature. Therefore, Functional Projections seem to be acquired later than Lexical Projections by L2 learners. The present study investigates Iranian L2 learners’ acquisition of English Extended Projections taking into account their level of grammatical proficiency. Specifically, the aim is ...
Recognition of Persian Texts Using an n-gram Language Model and Grammatical Refinement
Text recognition has been one of the growing research topics in recent years. Much of this research has focused on recognition of letters and sub-words as a basis for identifying larger text structures such as words, phrases and sentences. This thesis presents a new method in which the recognized sub-words are combined in order to provide meaningful words and sentences in Farsi tex...